PCT

WORLD INTELLECTUAL PROPERTY ORGANIZATION
International Bureau

INTERNATIONAL APPLICATION PUBLISHED UNDER THE PATENT COOPERATION TREATY (PCT)



(51) International Patent Classification: G06F 12/08

A1

(11) International Publication Number: WO 93/18459

(43) International Publication Date: 16 September 1993 (16.09.93)

(21) International Application Number: PCT/US93/01814
(22) International Filing Date: 3 March 1993 (03.03.93)

(30) Priority data:
07/847300    6 March 1992 (06.03.92)    US



(71) Applicant: RAMBUS INC. [US/US]; 2465 Latham Street,
Mountain View, CA 94040 (US).

(72) Inventors: KRISHNAMOHAN, Karnamadakala ; 3168
Areola Court, San Jose, CA (US). FARMWALD, Paul,
Michael ; 190 Golden Oak Drive, Portola Valley, CA
94028 (US). WARE, Frederick, Abbott ; 13961 Fremont
Pines, Los Altos, CA 94022 (US).

(74) Agents: VINCENT, Lester, J. et al.; Blakely, Sokoloff,
Taylor & Zafman, 12400 Wilshire Boulevard, 7th Floor,
Los Angeles, CA 90025 (US).



(81) Designated States: AT, AU, BB, BG, BR, CA, CH, CZ,
DE, DK, ES, FI, GB, HU, JP, KP, KR, LK, LU, MG,
MN, MW, NL, NO, NZ, PL, PT, RO, RU, SD, SE, SK,
UA, European patent (AT, BE, CH, DE, DK, ES, FR,
GB, GR, IE, IT, LU, MC, NL, PT, SE), OAPI patent
(BF, BJ, CF, CG, CI, CM, GA, GN, ML, MR, SN, TD,
TG).



Published 

With international search report 



(54) Title: PREFETCHING INTO A CACHE TO MINIMIZE MAIN MEMORY ACCESS TIME AND CACHE SIZE IN A
COMPUTER SYSTEM



[Figure: block diagram of the cache subsystem, showing processor interface 210, Ipfb 220, Dpfb 230, multiplexer (MUX) 260, main cache 240, and memory interface 250 coupled to the high speed memory bus.]


(57) Abstract 

A cache memory subsystem with a relatively small main cache and a relatively high hit rate. Unpredictable data is stored in
a main cache. Predictable data is stored in a set of instruction and data prefetch buffers. A cache subsystem with stride prediction
hardware is also described.

PREFETCHING INTO A CACHE TO MINIMIZE MAIN MEMORY
ACCESS TIME AND CACHE SIZE IN A COMPUTER SYSTEM

FIELD OF THE INVENTION: 

The present invention pertains to the field of computer
architecture. More particularly, this invention relates to cache
memory systems for improving data access times.

BACKGROUND OF THE INVENTION:

Using dynamic random access memories ("DRAMs") for a high 
performance main memory for a computer system is often less 
expensive than using static random access memories ("SRAMs"). 
Nevertheless, DRAMs are typically much slower than SRAMs. 

A common technique for lessening the impact of slow DRAM 
access time on main processor performance is to employ a cache 
memory. A cache memory is a limited size fast memory, usually made 
up of SRAMs, which stores blocks of data, known as lines, that reflect 
selected main memory locations. A cache memory is smaller than the 
main memory it reflects, which means the cache memory typically is 
not fully addressable and must store a tag field for each data line. The 
tag field identifies the main memory address corresponding to a
particular data line. 

When the main processor issues a read request and an address 
corresponding to desired data stored in main memory, the cache 
memory is checked by comparing the received address to the tag fields 
of the cache memory. If the desired data is stored in the cache, then a
"hit" occurs and the desired data is immediately available to the main 
processor. If the desired data is not stored in the cache, then a "miss" 
occurs, and the desired data must be fetched from slower main
memory. The typical goal in a cache memory design is to increase the
hit rate because a low hit rate slows main processor performance. 
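
The tag comparison and the hit/miss outcome described above can be sketched in a few lines of Python. This is only an illustrative model, not the circuit of any particular cache: the class name `TinyCache`, the 16-byte `LINE_SIZE`, and the use of a dictionary as a fully associative tag store are all assumptions made for the example.

```python
LINE_SIZE = 16  # bytes per line; an assumed value for illustration


class TinyCache:
    """Toy cache: a tag (the line number in main memory) selects a stored line."""

    def __init__(self):
        self.lines = {}   # tag -> line data
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        tag = address // LINE_SIZE        # tag identifies the main memory line
        if tag in self.lines:             # tag comparison succeeded: a "hit"
            self.hits += 1
        else:                             # a "miss": fetch from slower main memory
            self.misses += 1
            self.lines[tag] = main_memory[tag]
        return self.lines[tag]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

Reading two addresses that fall in the same line produces one miss followed by one hit, which is the effect a cache design tries to make as frequent as possible.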

One prior technique for increasing the hit rate in a cache
memory subsystem is to use a prefetch buffer along with a main cache.
A prefetch buffer is a memory that stores data prefetched from main
memory. Data is speculatively prefetched into the prefetch buffer
before a next read request based upon a prediction of the address for the
next read request. When the main processor issues the next read
request, the desired data may be available from the prefetch buffer if
the prediction was accurate. In typical prior art systems, if the
prediction was correct, the desired data is moved from the prefetch
buffer to the main cache and is supplied to the main processor.

Nevertheless, prior art prefetch schemes that store prefetched
data in the main cache often require relatively large main cache
memories in order to maintain a high hit rate because the main cache
typically becomes cluttered with predictable addresses, which are
typically sequential. Unfortunately, larger cache memories increase the
cost of the computer system and often preclude placement of effective
caches on-chip with the main processor.

SUMMARY AND OBJECTS OF THE INVENTION

One object of the present invention is to minimize data access
time in a computer system.

Another object of the present invention is to provide a
relatively efficient cache bridge to a main memory.

Another object of the present invention is to minimize the size
of the cache memory and to minimize main memory access time.

Another object of the present invention is to provide a
computer system with a relatively small cache memory with a
relatively high hit rate that is comparable with that of larger
cache memories.

A further object of the present invention is to provide an 
efficient cache memory subsystem suitable for placement on a
microprocessor chip. 

These and other objects of the invention are provided by a
method and apparatus for reducing main memory access time in a
computer system. In the cache memory subsystem of the present
invention, an address corresponding to a data line stored in the main
memory is received from a main processor, along with a read request.
The address is received by an instruction prefetch buffer, a data prefetch
buffer, and a main cache. If a hit occurs on one of the prefetch buffers,
the data line is read from the prefetch buffer and transferred to the
main processor. If a main cache hit occurs, then the desired data is read
from the main cache and transferred to the main processor. If a main
cache miss and prefetch miss occurs, then the desired data is fetched
from main memory, stored in the main cache, and transferred to the
main processor. In all cases (hit or miss), after the desired data has
been transferred to the main processor, a predicted address is generated.
A next data line stored at the predicted address in the main memory is
then fetched from the main memory and stored in the appropriate
prefetch buffer. As a result, only data at unpredictable addresses are
stored in the main cache. Data at predictable addresses do not clutter
the main cache, but are instead stored in the instruction and data
prefetch buffers.

Other objects, features and advantages of the present invention
will be apparent from the accompanying drawings, and from the
detailed description that follows below.

BRIEF DESCRIPTION OF THE DRAWINGS

The present invention is illustrated by way of example and not
limitation in the figures of the accompanying drawings in which like
references indicate similar elements, and in which:

Figure 1 is a block diagram of a computer system employing a
separate cache subsystem;

Figure 2 shows a computer system with a cache subsystem on a
processor chip;

Figure 3 is a functional block diagram illustrating the address
and data paths for one cache memory subsystem;

Figure 4 is a logical flow diagram of the method employed by the
cache memory subsystem;

Figure 5 is a block diagram of the hardware of a cache memory
subsystem that employs a prefetch assist unit;

Figure 6 is a logical flow diagram of the method employed by the
cache memory subsystem that has a prefetch assist unit;

Figure 7 illustrates stride prediction hardware of a cache
subsystem.

DETAILED DESCRIPTION

Figure 1 is a block diagram of the architecture of computer
system 5. Computer system 5 includes processor 10 for transferring
address, data, and control information to and from cache subsystem 20
over bus 15. Cache subsystem 20 transfers information to and from
main memory 40 over high speed bus 30. For the embodiment shown
in Figure 1, cache subsystem 20 comprises circuitry external to
processor 10. Cache subsystem 20 is functionally transparent to
processor 10. In other words, processor 10 issues read requests and
addresses over bus 15 as if processor 10 were directly connected to main
memory 40.

As described in more detail below, cache subsystem 20 helps to 
maximize cache hits while minimizing cache space.

Figure 2 shows another embodiment of the present invention
wherein cache subsystem 21 of computer system 7 resides within
processor chip 11. Processor 11 is coupled to cache subsystem 21. Cache
subsystem 21 is in turn coupled to main memory 41 via bus 31.

Figure 3 shows cache subsystem 20 of Figure 1. Cache subsystem
20 is comprised of processor interface 210, instruction prefetch buffer
220 (Ipfb), data prefetch buffer 230 (Dpfb), main cache 240, and memory
interface 250. Processor interface 210 is coupled to receive addresses
from processor 10 over bus 16. Processor interface 210 transfers data to
and from processor 10 over data bus 17, and transfers control
information to and from processor 10 over control bus 18. Buses 16, 17,
and 18 are part of bus 15 shown in Figure 1.

Ipfb 220 prefetches and buffers instructions for processor 10. Ipfb
220 is coupled to receive instruction addresses from processor interface
210 over address path 52, and transfer instructions to processor
interface 210 over data path 56. Ipfb 220 is coupled to transfer predicted
instruction addresses to memory interface 250 over address path 62 and
receive next instruction lines over data path 64.

In a similar manner, Dpfb 230 prefetches and buffers data for
processor 10. Dpfb 230 is coupled to receive addresses from processor
interface 210 over address path 54 and transfer data to processor
interface 210 over data path 58. Dpfb 230 transfers predicted data
addresses to memory interface 250 over address path 66 and receives
next data lines over data path 68.

Main cache 240 holds unpredictable instructions not prefetched
by Ipfb 220 and unpredictable data not prefetched by Dpfb 230. Main
cache 240 receives, through multiplexer 260, either instruction
addresses 52 or data addresses 54 from processor interface 210. In case
of a main cache 240 miss, main cache 240 transfers addresses 74 to
memory interface 250 and transfers data 76 to and from memory
interface 250.

Figure 4 is a flow diagram of a method employed by cache
subsystem 20. At block 100, a read request is received by cache
subsystem 20 from processor 10. Processor interface 210 receives the
desired address over address bus 16 and receives a read request signal
over control bus 18. The received address is routed (1) to Ipfb 220 over
address path 52, and (2) to Dpfb 230 over address path 54, and (3) to
main cache 240 through multiplexer 260. A signal on control bus 18
indicates whether the read request is for an instruction fetch or a data
fetch.

If control signals 18 indicate an instruction fetch, the instruction
prefetch buffer is checked for the desired instruction. This occurs when
processor interface 210 transfers instruction address 52 to Ipfb 220. On
the other hand, if control signals 18 indicate a data fetch, then the data
prefetch buffer is checked when processor interface 210 transfers data
address 54 to Dpfb 230. The received address, either instruction address
52 or data address 54, is transferred to main cache 240 through
multiplexor 260.

At decision block 110, a prefetch buffer or main cache hit is
sensed. If a prefetch buffer hit occurs, then control transfers to block
120, wherein the desired line, either instruction or data, is read from
the appropriate prefetch buffer, either Ipfb 220 or Dpfb 230. The
prefetch buffer is also "popped" to make room for prefetched
instructions or data. Alternatively, the prefetch buffer may not be
popped if a larger prefetch buffer is employed. The desired line is then
transferred to processor interface 210 over the appropriate data path,
either data path 56 or 58. Thereafter, at block 130, processor interface
210 transfers the desired line to processor 10 over data bus 17.

At block 180, a next sequential line is prefetched from main
memory 40 into the appropriate prefetch buffer. In the case of an
instruction fetch indicated by control signals 18, a next sequential
instruction address 62 is transferred from Ipfb 220 to memory interface
250. Memory interface 250 accesses main memory 40 over high speed
bus 30 and transfers next sequential instruction line 64 to Ipfb 220. In
the case of a data fetch indicated by control signals 18, a next sequential
data address 66 is transferred to memory interface 250 and next
sequential data line 68 is transferred from memory interface 250 to
Dpfb 230 after being fetched from main memory 40. Control then
proceeds to block 190, which ends the read request sequence.

If a main cache hit occurs at decision block 110, then control is 
transferred to block 150, wherein the desired line is read from main 
cache 240 and supplied to processor 10. The desired line is transferred 
from main cache 240 to processor interface 210 over data path 70.
Control then proceeds to block 180, wherein a next sequential line is 
prefetched as discussed above. 

If a main cache or prefetch buffer hit does not occur at decision 
block 110, then control transfers to block 160, wherein the "missed" 
line is fetched into main cache 240. Address 74, which is either the 
received instruction address 52 or data address 54, is transferred to 
memory interface 250. After accessing main memory 40, memory 
interface 250 transfers the desired line 76 to main cache 240. The 
desired line is stored in main cache 240 and transferred to processor
interface 210 over data path 70. Processor interface 210 transfers the 
desired line to processor 10 over data bus 17. Control then proceeds to 
block 180, wherein a next sequential line is then prefetched as discussed 
above. 
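
The read sequence of Figure 4 can be summarized as a short Python sketch. It is a behavioral model under stated assumptions, not the hardware itself: the function name `read_line`, the 16-byte `LINE_SIZE`, and the use of unbounded dictionaries for Ipfb, Dpfb, the main cache, and main memory are all hypothetical simplifications. Block numbers in the comments refer to the flow diagram described above.

```python
LINE_SIZE = 16  # assumed line size in bytes


def read_line(addr, is_instruction, ipfb, dpfb, main_cache, memory):
    """Serve one read request, then prefetch the next sequential line.

    ipfb, dpfb and main_cache are dicts mapping line address -> line data;
    memory is any mapping from line address to line data.
    Returns (line, source), where source names where the line was found.
    """
    line_addr = addr - (addr % LINE_SIZE)
    pfb = ipfb if is_instruction else dpfb   # selected by the control signal

    if line_addr in pfb:                     # block 120: prefetch buffer hit
        line = pfb.pop(line_addr)            # "pop" the entry to make room
        source = "prefetch"
    elif line_addr in main_cache:            # block 150: main cache hit
        line = main_cache[line_addr]
        source = "cache"
    else:                                    # block 160: miss, fill main cache
        line = memory[line_addr]
        main_cache[line_addr] = line
        source = "memory"

    # block 180: prefetch the next sequential line into the prefetch buffer
    next_addr = line_addr + LINE_SIZE
    if next_addr in memory:
        pfb[next_addr] = memory[next_addr]
    return line, source
```

In this model a sequential scan touches main memory directly only for its first line; every later line arrives through the prefetch buffer, so the main cache holds just the line at the unpredictable starting address.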

Figure 5 illustrates cache subsystem 22, which is another
embodiment of the present invention. Cache subsystem 22 employs a
prefetch assist cache ("PAC"). Cache subsystem 22 functions as a cache
bridge between processor 10 and a high performance memory system
40. An example of a high performance main memory system 40 is set
forth in PCT international patent application number PCT/US91/02590
filed April 16, 1991, published October 31, 1991, and entitled Integrated
Circuit I/O Using a High Performance Bus Interface.

Cache subsystem 22 is coupled to processor 10 via bus 15. Cache
subsystem 22 is coupled to main memory 40 via bus 30. Cache
subsystem 22 is located between processor 10 and main memory 40.
Cache subsystem 22 is "transparent" to processor 10. Cache subsystem
22 includes instruction prefetch unit 510, data prefetch unit 530,
prefetch assist unit 520, main cache unit 540, and control unit 550.
Address latch 502 receives addresses 16 from processor 10. Address
latch 502 transmits address signals 516 to instruction prefetch unit 510,
data prefetch unit 530, prefetch assist unit 520, and main cache unit 540.
Address signals 516 are also received by increment register 570 and
multiplexer 560. Data is transferred between processor 10 and
instruction prefetch unit 510, data prefetch unit 530, and main cache
unit 540 over bus 17.

Instruction prefetch unit 510 is comprised of a control section
("CFB CTL") 631 and a data storage section 632 comprised of (1) data 633
and (2) tags and compare ("TAGS & CMP") section 634. Control section
631 communicates with control unit 550 over signal lines 552. Data
storage section 632 is organized as a set of four fully associative buffers
of size 16 bytes each. On an instruction prefetch hit, the desired line is
supplied to processor 10, and the entry is "popped" to make room for a
new prefetched line. Entries are replaced on a least recently used basis.

Similarly, data prefetch unit 530 is comprised of a control section
("DFB CTL") 641 and a data storage section 642 comprised of (1) data 643
and (2) tags and compare ("TAGS & CMP") section 644. Control section
641 communicates with control unit 550 over signal lines 553. Data
storage section 642 is organized as a set of four fully associative buffers
of size 16 bytes each. On a data prefetch hit, the desired line is supplied
to processor 10 and the entry is "popped" to make room for a new
prefetched line on a least recently used basis.
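
A prefetch unit organized this way (four fully associative 16-byte entries, popped on a hit, replaced on a least recently used basis) might be modeled as follows. The class name `PrefetchBuffer` and its method names are hypothetical, and an `OrderedDict` stands in for the tag-compare hardware; this is a sketch of the behavior, not of the circuit.

```python
from collections import OrderedDict


class PrefetchBuffer:
    """Toy model of a small fully associative prefetch buffer."""

    def __init__(self, entries=4):
        self.entries = entries
        self.store = OrderedDict()   # line address -> line data, oldest first

    def fill(self, line_addr, line):
        """Store a newly prefetched line, evicting the oldest entry if full."""
        if line_addr in self.store:
            self.store.move_to_end(line_addr)    # refresh an existing entry
        elif len(self.store) >= self.entries:
            self.store.popitem(last=False)       # evict least recently used
        self.store[line_addr] = line

    def lookup(self, line_addr):
        """Return the line and pop it on a hit, or None on a miss."""
        return self.store.pop(line_addr, None)
```

Because a hit removes the entry, the least recently used entry in this model is simply the one that was filled longest ago.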

Main cache unit 540 includes a control section ("CACHE CTL") 
651 that communicates with control unit 550 over signal lines 554. 
Main cache unit 540 also includes data storage section 652 comprised of 
(1) data 653 and (2) tags and compare ("TAGS & CMP") section 654.
Data storage section 652 is organized as an 8 kilobyte 4-way set
associative unified cache with least recently used replacement. 
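
As a worked example of this organization, the sketch below splits an address into tag, set index, and byte offset for an 8 kilobyte 4-way set associative cache. The text does not state the main cache line size, so the 16-byte `LINE_BYTES` is an assumption borrowed from the prefetch buffer entry size; with it, the cache has 128 sets. The function name `split_address` is hypothetical.

```python
CACHE_BYTES = 8 * 1024
WAYS = 4
LINE_BYTES = 16   # assumption: same size as a prefetch buffer entry
NUM_SETS = CACHE_BYTES // (WAYS * LINE_BYTES)   # 128 sets under that assumption


def split_address(addr):
    """Split a byte address into (tag, set_index, offset)."""
    offset = addr % LINE_BYTES                  # byte within the line
    set_index = (addr // LINE_BYTES) % NUM_SETS  # which set to search
    tag = addr // (LINE_BYTES * NUM_SETS)        # compared against the 4 ways
    return tag, set_index, offset
```

For instance, address 0x1234 falls at byte 4 of a line in set 35 with tag 2; the four ways of set 35 would each be compared against that tag.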

Prefetch assist unit 520 is comprised of (1) a control section
("PAC CTL") 661, (2) a data section 652 comprised of previous tags
("PREV TAGS") 653 and next tags ("NEXT TAGS") 654, (3) a last code
address register ("CAR") 655, and (4) a last data address register
("DAR") 656. Data section 652 is organized as a 256 entry 4-way set
associative cache with least recently used replacement.

CAR 655 and DAR 656 are used in conjunction with desired
address 516 to create a new PAC entry in the data section of prefetch
assist unit 520. A PAC entry in the data section is comprised of a
predicted address ("NEXT TAG") and an associated tag field ("PREV
TAGS") defined by the last instruction or data address. The CAR and
DAR entries are created by storing desired address 516 in CAR 655 or
DAR 656, depending upon whether control signals 18 indicate an
instruction read or a data read sequence.

Cache subsystem 22 employs a relatively small main cache to 
achieve a relatively high hit rate and avoids having a much larger 
main cache. For example, when employed as a cache bridge to high 
performance main memory system 40 capable of transferring 500
Megabytes/second, cache subsystem 22 uses only an 8 kilobyte main
cache. 

Figure 6 is a flow diagram illustrating the method employed by
cache subsystem 22. At block 300, a read request is received by cache
subsystem 22 from processor 10. This occurs when address register 502
receives the desired address over address bus 16 and processor control
unit 504 receives a read request over control bus 18. The desired
address 516 is received by instruction prefetch unit 510, data prefetch
unit 530, main cache unit 540, prefetch assist unit 520, increment
register 570, and multiplexer 560.

At decision block 310, if control unit 550 senses an instruction
prefetch buffer hit via bus 552 or a data prefetch buffer hit via bus 553,
then control transfers to block 320, wherein the desired line, either
instructions or data, is "popped" from the appropriate prefetch buffer
and transferred to processor 10 over data bus 17.

At decision block 360, if a prefetch assist hit on bus 551 is sensed by
control unit 550, then control is transferred to block 370, wherein
control unit 550 issues mux control 555, which causes multiplexor 560
to couple predicted address 521 to fetch address register 580. Control
unit 550 also issues control signals 556 to signal a fetch cycle to memory
control unit 586. Fetch address register 580 transmits fetch address 619,
and memory control 586 transmits control signals 622 over high
performance bus 30, which initiates a main memory 40 fetch cycle. The
line corresponding to predicted address 521 is fetched from main
memory 40. Line 20 is received by receive data register 582 and
transferred to either instruction prefetch unit 510 or data prefetch unit
530 over data bus 17, under control of fill prefetch signal 552 or 553
issued by control unit 550, depending upon whether control signals 18
indicate an instruction or a data read sequence. Control then proceeds
to block 410, which ends the read request sequence.

If a prefetch assist hit on bus 551 is not sensed by control unit 550
at block 360, then control proceeds to block 380, wherein increment
register 570 generates next sequential line address 571. Control unit 550
then issues mux control 555 to couple next sequential line address 571
to fetch address register 580. Control unit 550 also issues control signals
556 to signal a fetch cycle to memory control unit 586. Fetch address
register 580 transmits fetch address 619, and memory control 586
transmits control signals 622 over high performance bus 30, which
initiates a main memory 40 fetch cycle. The next sequential line is
then fetched from main memory 40. Next sequential line 20 is
received by receive data register 582. Next sequential line 20 is then
transferred to either instruction prefetch unit 510 or data prefetch unit
530 over data bus 17, depending upon whether control signals 18
indicate an instruction or a data read sequence. This is done under the
control of fill prefetch signal on bus 552 or bus 553 issued by control
unit 550. Control then proceeds to block 410, which ends the read
request sequence.

If a main cache hit indicated on bus 554 is received at decision 
block 310, then control transfers to block 340, wherein the desired line 
is read from main cache unit 540 and supplied to processor 10 over data
bus 17. Control then proceeds to block 360, wherein prefetching is
performed as discussed above. 

If a prefetch buffer hit or main cache hit is not received at
decision block 310, then control transfers to block 350, wherein the
"missed" line is fetched into main cache unit 540. This occurs when
control unit 550 issues control signals 555, which cause multiplexer 560
to couple desired address 516 to fetch address register 580. Control unit
550 also issues control signals 556 to signal a fetch cycle to memory
control unit 586. Fetch address register 580 transmits fetch address 619
and memory control 586 transmits control signals 622 over high
performance bus 30, which initiates a main memory 40 fetch cycle. The
missed line is returned on bus 620, stored in receive data register 582,
and transferred to main cache unit 540 and processor 10 over data bus
17. Control unit 550 then issues a fill main cache signal on bus 554,
which causes main cache 540 to create a new entry.

At block 390, a next sequential line is prefetched from main
memory 40 into the appropriate prefetch buffer. As before, increment
register 570 generates next sequential line address 571. Control unit 550
then issues mux control 555 to couple next sequential line address 571
to fetch address register 580. Fetch address register 580 transmits fetch
address 619 and memory control 586 transmits control signals 622 over
high performance bus 30, which initiates a main memory 40 fetch
cycle. The next sequential line is fetched from main memory 40. Next
sequential line 20 is received by receive data register 582. Next
sequential line 20 is transferred to either instruction prefetch unit 510
or data prefetch unit 530 over data bus 17 under control of a fill
prefetch signal on bus 552 or bus 553 issued by control unit 550.

At block 400, a new PAC entry is created when control unit 550
issues a create PAC signal on bus 551. The CAR and DAR of prefetch
assist unit 520 are used in conjunction with desired address 516 to
create a new PAC entry in the data section of prefetch assist unit 520. A
PAC entry is created in the data section by storing desired address 516 as
the predicted address (NEXT TAG), along with an associated tag field
(PREV TAGS) defined by the last instruction address CAR or data
address DAR, depending upon whether control signals 18 indicate an
instruction or a data fetch sequence. Control then proceeds to 410,
which ends the read request sequence.
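
The way the prefetch assist cache chooses a prefetch address in Figure 6 can be sketched as follows. The sketch rests on several assumptions that the text leaves open: the class name `PrefetchAssist` and its `access` method are hypothetical, an unbounded dictionary stands in for the 256 entry 4-way set associative data section, the line size is taken as 16 bytes, and the CAR/DAR registers are assumed to be updated on every read.

```python
LINE_SIZE = 16  # assumed line size in bytes


class PrefetchAssist:
    """Toy model of the PAC: maps a previous address to the address that followed."""

    def __init__(self):
        self.next_of = {}                          # PREV TAG -> NEXT TAG
        self.last = {"code": None, "data": None}   # stand-ins for CAR and DAR

    def access(self, addr, kind, hit):
        """Return the address to prefetch after a read of addr.

        kind is "code" or "data"; hit says whether the read hit a prefetch
        buffer or the main cache.
        """
        if hit:
            # blocks 360-380: use the PAC prediction, else the next sequential line
            target = self.next_of.get(addr, addr + LINE_SIZE)
        else:
            target = addr + LINE_SIZE              # block 390: sequential prefetch
            prev = self.last[kind]
            if prev is not None:
                self.next_of[prev] = addr          # block 400: create a PAC entry
        self.last[kind] = addr                     # assumed CAR/DAR update
        return target
```

After a miss at 0x400 that followed a read of 0x100, a later hit on 0x100 prefetches 0x400 rather than the next sequential line, which is how non-sequential but repeating access patterns become predictable.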

Figure 7 illustrates an example of stride prediction hardware
provided by another embodiment of cache subsystem 21 of Figure 2. A
stride predictor 703 stores the last few program counter ("PC") and
virtual address ("VA") pairs received from processor 11 during load
instructions, and checks for re-occurrences of PC values corresponding
to a load instruction. If a PC value re-occurrence is detected by the
stride predictor 703, then the corresponding VA value ("VAmatch") is
used to generate a predicted VA 795 for prefetching into a data prefetch
buffer, such as Dpfb 230.

For one embodiment, stride predictor 703 resides on processor 
chip 11. Registers 701 and 702 are embedded within processor 11 and
contain the PC and VA values, respectively. For this example, the last
5 PC and VA pairs are stored. 

Stride predictor 703 receives VA 781 and PC 783 from processor
11, along with signal 780, which indicates a load instruction, i.e. a load
from memory or a memory read. In this example, the stride predictor
703 is comprised of stages 735 through 739. For stage 735, register 710
stores VA 781 and register 720 stores PC 783, both under control of load
signal 780. For stage 736, register 711 stores VA 790 received from
register 710 and register 721 stores PC 792 received from register 720,
both under control of load signal 780.

Stages 736 through 739 function similarly to stage 735. Each time
load signal 780 indicates a load instruction, the VA 781 and PC 783
values are propagated and stored in stages 735 through 739 in a "first in
first out" manner.

Comparators 750 through 754 compare the newest PC 783 to the
PC output of each stage 735 through 739. For example, in stage 735
comparator 750 compares PC output 792 of register 720 with newest PC
783. If comparator 750 detects a match, then enable signal 791 causes
driver 740 to couple the VA 790 from register 710 to VAmatch 782. In a
similar manner, comparator 751 tests the output of register 721 for a
match with newest PC 783. If a match is detected by comparator 751,
then the output of register 711 is coupled to VAmatch 782 by driver
741.

Each stage 735 through 739 tests the newest PC 783 against its stored
PC value. If any of the outputs of registers 720 through 724 match the
newest PC 783, then the corresponding VA stored in registers 710
through 714 is coupled to VAmatch 782.
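
The five-stage first in first out structure and its per-stage comparators can be modeled in software as below. The class name `StrideStages` and method name `load` are hypothetical, and one behavior is an assumption: when more than one stage holds a matching PC, this sketch takes the newest match, a case the description does not address.

```python
from collections import deque


class StrideStages:
    """Toy model of the (PC, VA) stages feeding VAmatch."""

    def __init__(self, depth=5):
        # Newest pair sits at the left; the oldest falls off the right when full.
        self.pairs = deque(maxlen=depth)

    def load(self, pc, va):
        """Handle one load instruction; return VAmatch, or None if no PC matches."""
        va_match = None
        for old_pc, old_va in self.pairs:      # newest stage first
            if old_pc == pc:                   # comparator detects a re-occurrence
                va_match = old_va              # driver couples stored VA to VAmatch
                break
        self.pairs.appendleft((pc, va))        # propagate values through the stages
        return va_match
```

A load at the same PC that walks through memory yields the previous VA as VAmatch on each re-occurrence, provided no more than four other loads intervene.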

Math logic 730 receives VAmatch 782 and newest VA 781, and
generates predicted VA 795 in accordance with the following equation:

Predicted VA = newest VA + (newest VA - VAmatch) = 2 x
newest VA - VAmatch.

For example, math logic 730 may in one embodiment comprise a
simple adder with one input comprising a logically left shifted newest
VA 781 (2 x newest VA), and the other input comprising the two's
complement of VAmatch 782. VA 795 can be used for prefetching into
a data prefetch buffer, such as Dpfb 230.
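
The adder arrangement just described can be checked with a small numerical sketch. The 32-bit address width (`MASK`) and the function name `predicted_va` are assumptions made for illustration; the arithmetic itself follows the equation Predicted VA = 2 x newest VA - VAmatch.

```python
MASK = 0xFFFFFFFF  # assumed 32-bit virtual addresses


def predicted_va(newest_va, va_match):
    """Compute 2 x newest VA - VAmatch with a shift, a two's complement, and an add."""
    shifted = (newest_va << 1) & MASK       # logical left shift: 2 x newest VA
    neg_match = (~va_match + 1) & MASK      # two's complement of VAmatch
    return (shifted + neg_match) & MASK     # simple adder, wrapped to 32 bits
```

With newest VA 1008 and VAmatch 1000 the stride is 8 and the predicted VA is 1016; with the two swapped the stride is negative and the prediction is 992.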

In the foregoing specification, the invention has been described
with reference to specific exemplary embodiments thereof. It will,
however, be evident that various modifications and changes may be
made thereto without departing from the broader spirit and scope of
the invention as set forth in the appended claims. The specification
and drawings are, accordingly, to be regarded in an illustrative rather
than a restrictive sense.

CLAIMS

What is claimed is:

1. A method for minimizing memory access time in a computer
system, comprising the steps of:

(a) receiving an address corresponding to a data line stored in
the main memory and receiving a read request from a main processor;

(b) transferring the address to a prefetch buffer, and a main
cache;

(c) sensing a prefetch buffer hit signal from the prefetch buffer,
and a main cache hit signal from the main cache, the prefetch buffer
hit signal indicating whether the data line corresponding to the address
is stored in the prefetch buffer, and the main cache hit signal indicating
whether the data line corresponding to the address is stored in the
main cache;

(d) if the prefetch buffer hit signal indicates the data line is stored
in the prefetch buffer, reading the data line from the prefetch buffer,
and transmitting the data line to the main processor, then proceeding
to step (h);

(e) if the main cache hit signal indicates the data line is stored in
the main cache, reading the data line from the main cache, and
transmitting the data line to the main processor, then proceeding to
step (h);

(f) transmitting the address to the main memory;

(g) receiving the data line from the main memory, and
transmitting the data line to the main processor;

(h) generating a predicted address, fetching a next data line
corresponding to the predicted address from the main memory, and
storing the next data line in the prefetch buffer.

2. The method of claim 1, wherein step (g) further comprises the 
step of storing the data line in the main cache. 

3. The method of claim 1, wherein the step of generating the 
predicted address in step (h) comprises incrementing the address. 

4. A method for minimizing memory access time in a computer 
system, comprising the steps of: 

(a) receiving a read request from a main processor and an 
address corresponding to a data line stored in the main memory; 

(b) transferring the address to a prefetch buffer, a main cache, 
and a prefetch assist cache; 

(c) sensing a prefetch buffer hit signal from the prefetch buffer, 
and a main cache hit signal from the main cache, the prefetch buffer 
hit sig^ial indicating whether the data line corresponding to the address 
is stored in the prefetch buffer, and the main cache hit signal indicating 
whether the data line corresponding to the address is stored in the 
main cache; 

(d) if the prefetch buffer hit signal indicates tiie data line is stored 
in tiie prefetch buffer, reading the data line from the prefetch buffer, 
and transmitting the data line to the main processor, then proceeding 
to step (f); 

(e) if the main cache hit signal indicates the data line is not 
stored in the main cache, then fetching the data line from the main 
memory, transmitting the data line to the main processor, and 
updating the prefetch assist cache; 
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(£) sensing a prefetch assist cache hit signal from the prefetch 
assist cache, the prefetch assist cache hit signal indicating whether a 
predicted address corresponding to the address is stored in the prefetch 
assist cache; 

(g) if the prefetch assist cache hit signal indicates the predicted 
address corresponding to the address is not stored in the prefetch assist 
cache, fetching a next sequential data line from the main memory and 
storing the next sequential data line in the prefetch buffer; 

(h) if the prefetch assist cache hit signal indicates the predicted 
address corresponding to the address is stored in the prefetch buffer, 
fetching a next data line corresponding to the predicted address from 
the main memory and storing the next data line in the prefetch buffer. 
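
The flow of steps (a) through (h) can be sketched in software as follows. This is a minimal illustration, not the claimed hardware: the class name `CacheSubsystem`, the dictionary-based buffers with no capacity limit or replacement policy, and addresses counted in whole data lines are all assumptions made for the example. The claim does not say what happens on a main cache hit; the sketch assumes the line is returned from the main cache and that the prefetch steps (f) through (h) still run. The update of the prefetch assist cache follows the wording of claim 5, step (c).

```python
LINE = 1  # hypothetical: addresses are counted in whole data lines


class CacheSubsystem:
    """Software model of the read flow in claim 4 (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory        # main memory: line address -> data line
        self.prefetch_buffer = {}   # line address -> prefetched data line
        self.main_cache = {}        # line address -> data line
        self.pac = {}               # prefetch assist cache: address -> predicted address
        self.last_address = None    # last address received from the processor

    def read(self, address):
        # (b)/(c): present the address to both structures and sense the hit signals.
        pb_hit = address in self.prefetch_buffer
        mc_hit = address in self.main_cache

        if pb_hit:
            # (d): serve the line from the prefetch buffer.
            data = self.prefetch_buffer[address]
        elif mc_hit:
            # Assumption: a main cache hit is served from the main cache.
            data = self.main_cache[address]
        else:
            # (e): fetch from main memory and update the prefetch assist cache,
            # indexed by the last address received (claim 5, step (c)).
            data = self.memory[address]
            self.main_cache[address] = data
            if self.last_address is not None:
                self.pac[self.last_address] = address
        self.last_address = address

        # (f): sense the prefetch assist cache hit signal.
        if address in self.pac:
            next_address = self.pac[address]   # (h): predicted address
        else:
            next_address = address + LINE      # (g): next sequential line
        if next_address in self.memory:
            self.prefetch_buffer[next_address] = self.memory[next_address]
        return data
```

In this model, once a non-sequential jump from one address to another has been seen and recorded, a later read of the first address causes the jump target to be prefetched instead of the next sequential line.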

5. The method of claim 4, wherein fetching the data line from the 
main memory and updating the prefetch assist cache in step (e) 
comprises the steps of: 

(a) transmitting the address to the main memory; 

(b) receiving the data line from the main memory, storing the 
data line in the main cache, and transmitting the data line to the main 
processor; 

(c) storing the address in the prefetch assist cache such that the 
address is selected by a last address received from the main processor. 

6. The method of claim 4, further comprising the steps of: 

(i) receiving a program counter address, a virtual address, and a 
load instruction control signal from the main processor; 

(j) storing the program counter address and the virtual address if 
the load instruction control signal is received from the main processor, 
such that a last "n" program counter addresses and virtual addresses 
are stored; 

(k) comparing the program counter address to each of the last 
"n" program counter addresses; 

(l) if the program counter address equals one of the last "n" 
program counter addresses, then generating a predicted virtual address 
by multiplying a corresponding stored virtual address by 2, and adding 
the virtual address; 

(m) if the program counter address equals one of the last "n" 
program counter addresses, fetching the next data line corresponding 
to the predicted virtual address from the main memory and storing the 
next data line in the prefetch buffer. 
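
The history mechanism of steps (i) through (m) can be sketched as below. The class name `StridePredictor`, the default of four entries, and the use of a software queue are assumptions for illustration. Note one deliberate substitution: the sketch uses conventional constant-stride extrapolation (current address plus the difference between the current and the stored address), which is not the arithmetic as worded in step (l). It is shown only because the abstract refers to stride prediction, and it should not be read as the claimed computation.

```python
from collections import deque


class StridePredictor:
    """Keeps the last n (program counter, virtual address) pairs for load
    instructions and extrapolates a constant stride on a program counter match."""

    def __init__(self, n=4):
        # Oldest entries drop out automatically once n pairs are stored.
        self.history = deque(maxlen=n)

    def observe_load(self, pc, vaddr):
        predicted = None
        # Step (k): compare the program counter against the stored ones,
        # most recent entry first.
        for stored_pc, stored_vaddr in reversed(self.history):
            if stored_pc == pc:
                # Assumption: constant-stride extrapolation, not the
                # literal wording of step (l).
                predicted = vaddr + (vaddr - stored_vaddr)
                break
        # Step (j): record this load.
        self.history.append((pc, vaddr))
        return predicted
```

A caller would pass each returned address, when it is not `None`, to whatever routine fills the prefetch buffer, which corresponds to step (m).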

7. An apparatus for minimizing memory access time in a 
computer system, comprising: 

means for receiving a read request from a main processor and an 
address corresponding to a data line stored in the main memory; 

means for transferring the address to a prefetch buffer and a 
main cache; 

means for sensing a prefetch buffer hit signal from the prefetch 
buffer, the prefetch buffer hit signal indicating whether the data line 
corresponding to the address is stored in the prefetch buffer; 

means for reading the data line from the prefetch buffer and 
transmitting the data line to the main processor if the prefetch buffer 
hit signal indicates the data line is stored in the prefetch buffer; 

means for sensing a main cache hit signal from the main cache, 
the main cache hit signal indicating whether the data line 
corresponding to the address is stored in the main cache; 

means for fetching the data line from the main memory if the 
main cache hit signal indicates the data line is not stored in the main 
cache; 

means for reading the data line from the main cache and 
transmitting the data line to the main processor; 

means for fetching a next sequential data line from the main 
memory and storing the next sequential data line in the prefetch 
buffer; 

means for generating a predicted address, fetching a next data 
line corresponding to the predicted address from the main memory, 
and storing the next data line in the prefetch buffer. 

8. The apparatus of claim 7, further comprising: 

means for transferring the address to a prefetch assist cache; 

means for sensing a prefetch assist cache hit signal from the 
prefetch assist cache, the prefetch assist cache hit signal indicating 
whether a predicted address corresponding to the address is stored in 
the prefetch assist cache; 

means for fetching the next sequential data line from the main 
memory and storing the next sequential data line in the prefetch buffer 
if the prefetch assist cache hit signal indicates the predicted address 
corresponding to the address is not stored in the prefetch assist cache; 

means for fetching a next data line corresponding to the 
predicted address from the main memory and storing the next data 
line in the prefetch buffer if the prefetch assist cache hit signal indicates 
the predicted address corresponding to the address is stored in the 
prefetch buffer; 

means for storing the address in the prefetch assist cache such 
that the address is selected by a last address received from the main 
processor. 

9. The apparatus of claim 7, further comprising: 

means for receiving a program counter address, a virtual 
address, and a load instruction control signal from the main processor; 

means for storing the program counter address and the virtual 
address if the load instruction control signal is received from the main 
processor, such that a last "n" program counter addresses and virtual 
addresses are stored; 

means for comparing the program counter address to each of the 
last "n" program counter addresses; 

means for generating a predicted virtual address by multiplying 
a corresponding stored virtual address by 2, and adding the virtual 
address, if the program counter address equals one of the last "n" 
program counter addresses; 

means for fetching the next data line corresponding to the 
predicted virtual address from the main memory and storing the next 
data line in the prefetch buffer, if the program counter address equals 
one of the last "n" program counter addresses. 

10. A cache subsystem, comprising: 

control means coupled to receive an instruction prefetch hit 
signal, a data prefetch hit signal, a main cache hit signal, the control 
means coupled to receive a processor control signal from a main 
processor and coupled to receive a memory control signal from a main 
memory, the control means generating a fill instruction signal, a fill 
data signal, a fill cache signal, and a next address select signal; 

instruction prefetch means coupled to receive an address and 
the fill instruction signal, the instruction prefetch means storing a 
plurality of prefetched instruction lines, each of the prefetched 
instruction lines having a corresponding tag, the instruction prefetch 
means generating the instruction hit signal if the address corresponds 
to the tag of one of the prefetched instruction lines, the instruction 
prefetch means coupled to transmit and receive the prefetched 
instructions over a data bus; 

data prefetch means coupled to receive the address and the fill 
data signal, the data prefetch means storing a plurality of prefetched 
data lines, each of the prefetched data lines having a corresponding tag, 
the data prefetch means generating the data hit signal if the address 
corresponds to the tag of one of the prefetched data lines, the data 
prefetch means coupled to transmit and receive the prefetched data 
lines over the data bus; 

main cache means coupled to receive the address and the fill 
cache signal, the main cache means storing a plurality of unpredicted 
lines, each of the unpredicted lines having a corresponding tag, the 
main cache means generating the cache hit signal if the address 
corresponds to the tag of one of the unpredicted lines, the main cache 
means coupled to transmit and receive the unpredicted lines over the 
data bus; 

prediction means coupled to receive the address and the next 
address select signal, the prediction means generating a predicted 
address, the prediction means selectively transmitting the address and 
the predicted address to the main memory; 

memory interface means coupled to transfer information 
between the main memory and the main processor, the instruction 
prefetch means, the data prefetch means, and the main cache means. 

11. The cache subsystem of claim 10, wherein the instruction 
prefetch means comprises fully associative buffer means having a 
plurality of line entries, the line entries having a data field for storing 
the prefetched instruction lines and a tag field for selecting the 
prefetched instruction lines. 

12. The cache subsystem of claim 10, wherein the data prefetch 
means comprises fully associative buffer means having a plurality of 
line entries, the line entries having a data field for storing the 
prefetched data lines and a tag field for selecting the prefetched data 
lines. 

13. The cache subsystem of claim 10, wherein the cache means 
comprises set associative cache means having a plurality of line 
entries, the line entries having a data field for storing the unpredicted 
lines and a tag field for selecting the unpredicted lines. 
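
Claims 11 through 13 distinguish fully associative buffers from a set associative cache, each built from line entries with a tag field and a data field. The difference can be sketched as below. The class names, the first-in first-out replacement, and the split of a line address into index and tag by modulo and integer division are assumptions chosen for the example; the claims do not specify them.

```python
class FullyAssociativeBuffer:
    """Any line may occupy any entry; lookup compares every stored tag."""

    def __init__(self, entries):
        self.entries = entries
        self.lines = []  # (tag, data) pairs, oldest first

    def lookup(self, line_address):
        for tag, data in self.lines:
            if tag == line_address:   # the tag is the full line address
                return data
        return None

    def fill(self, line_address, data):
        self.lines = [(t, d) for t, d in self.lines if t != line_address]
        if len(self.lines) >= self.entries:
            self.lines.pop(0)         # assumption: FIFO replacement
        self.lines.append((line_address, data))


class SetAssociativeCache:
    """Low address bits choose a set; only tags within that set are compared."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [[] for _ in range(num_sets)]

    def _split(self, line_address):
        # Assumption: index from the remainder, tag from the quotient.
        return line_address % self.num_sets, line_address // self.num_sets

    def lookup(self, line_address):
        index, tag = self._split(line_address)
        for stored_tag, data in self.sets[index]:
            if stored_tag == tag:
                return data
        return None

    def fill(self, line_address, data):
        index, tag = self._split(line_address)
        ways = self.sets[index]
        ways[:] = [(t, d) for t, d in ways if t != tag]
        if len(ways) >= self.ways:
            ways.pop(0)               # assumption: FIFO replacement within the set
        ways.append((tag, data))
```

A fully associative organization suits a small prefetch buffer because no two lines can conflict over a fixed position, while the set associative organization limits the number of tag comparisons needed in the larger main cache.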

14. The cache subsystem of claim 10, wherein the prediction means 
comprises: 

increment register means coupled to receive the address, the 
increment register means generating a next sequential address by 
incrementing the address; 

multiplexer means coupled to receive the address, the next 
sequential address, and the next address select signal, the multiplexer 
means selectively transmitting the address and the next sequential 
address to the main memory under control of the next address select 
signal. 

15. The cache subsystem of claim 10, wherein the prediction means 
comprises: 

increment register means coupled to receive the address, the 
increment register means generating a next sequential address by 
incrementing the address; 

prefetch assist means coupled to receive the address and coupled 
to receive a create PAC entry signal from the control means, the 
prefetch assist means storing a plurality of predicted addresses, each of 
the predicted addresses having a corresponding tag, the prefetch assist 
means generating a PAC hit signal if the address corresponds to the tag 
of one of the predicted addresses, the prefetch assist means coupled to 
transmit a next predicted address corresponding to the address if the 
PAC hit signal is generated; 

multiplexer means coupled to receive the address, the next 
sequential address, the next predicted address, and the next address 
select signal, the multiplexer means selectively transmitting the 
address, the next sequential address, and the next predicted address to 
the main memory under control of the next address select signal. 
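
The prediction means of claim 15 amounts to a three-input selection among the address, the next sequential address, and the next predicted address. A minimal sketch follows, assuming a line size of 16 bytes, a dictionary standing in for the prefetch assist means, and an enumeration `NextAddressSelect` for the next address select signal. None of these names or values come from the claims.

```python
from enum import Enum

LINE_BYTES = 16  # hypothetical line size; the claims give no value


class NextAddressSelect(Enum):
    ADDRESS = 0          # pass the processor address through
    NEXT_SEQUENTIAL = 1  # output of the increment register
    NEXT_PREDICTED = 2   # output of the prefetch assist means


def prediction_means(address, pac, select):
    """Return the address that would be driven to main memory.

    `pac` maps an address (the tag) to its stored predicted address.
    """
    next_sequential = address + LINE_BYTES   # increment register
    next_predicted = pac.get(address)        # None models "no PAC hit"

    if select is NextAddressSelect.ADDRESS:
        return address
    if select is NextAddressSelect.NEXT_SEQUENTIAL:
        return next_sequential
    if next_predicted is None:
        # Assumption: the control means never selects this input without a PAC hit.
        raise ValueError("next predicted address selected without a PAC hit")
    return next_predicted
```

In the claimed arrangement the control means generates the select signal; in this sketch the caller plays that role and would choose `NEXT_PREDICTED` only after observing a PAC hit.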

16. The cache subsystem of claim 15, wherein the prefetch assist 
means comprises: 

set associative cache means having a plurality of line entries, the 
line entries having a data field for storing the predicted addresses and a 
tag field for selecting the predicted addresses; 

first register means coupled to receive the address and store the 
address if the processor control signal indicates an instruction read 
cycle; 

second register means coupled to receive the address and store 
the address if the processor control signal indicates a data read cycle. 